Wikipedia:Bots/Requests for approval/Magic links bot
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: JJMC89 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 17:41, Wednesday, March 29, 2017 (UTC)
Automatic, Supervised, or Manual: Automatic
Programming language(s): Python
Source code available: Pending
Function overview: Replace magic links with templates per RfC
Links to relevant discussions (where appropriate): RfC
Edit period(s): Daily
Estimated number of pages affected: 500k+
Exclusion compliant: Yes
Already has a bot flag: No
Function details: Replace magic links with templates per RfC
Regexes are based on doMagicLinks.
- ISBN
  - Find: \b(ISBN)((?:[^\S\n]| |&\#0*160;|&\#[Xx]0*[Aa]0;)+)((?:97[89](?:-|(?:[^\S\n]| |&\#0*160;|&\#[Xx]0*[Aa]0;))?)?(?:[0-9](?:-|(?:[^\S\n]| |&\#0*160;|&\#[Xx]0*[Aa]0;))?){9}[0-9Xx])\b
    (Simplified: \b(ISBN)({spaces})((?:97[89]{spaceDash}?)?(?:[0-9]{spaceDash}?){9}[0-9Xx])\b)
  - Replace: {{ISBN|\3}}
- PMID
  - Find: \b(PMID)((?:[^\S\n]| |&\#0*160;|&\#[Xx]0*[Aa]0;)+)([0-9]+)\b
    (Simplified: \b(PMID)({spaces})([0-9]+)\b)
  - Replace: {{PMID|\3}}
- RFC (removed from the task; see the discussion below)
  - Find: \b(RFC)((?:[^\S\n]| |&\#0*160;|&\#[Xx]0*[Aa]0;)+)([0-9]+)\b
    (Simplified: \b(RFC)({spaces})([0-9]+)\b)
  - Replace: {{IETF RFC|\3}}
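As a rough illustration of how the simplified ISBN and PMID patterns above could be applied, here is a minimal Python sketch. It is not the bot's actual source (which was listed as pending). The names `SPACE`, `SPACE_DASH`, and `replace_magic_links` are hypothetical, and the second alternative of the space pattern is assumed to be the `&nbsp;` entity, which appears to have been rendered as a plain space in the listing above.

```python
import re

# One whitespace-like token: any whitespace except a newline, or a
# non-breaking-space entity. ASSUMPTION: "&nbsp;" stands in for the
# alternative that shows up as a bare space in the listing.
SPACE = r"(?:[^\S\n]|&nbsp;|&\#0*160;|&\#[Xx]0*[Aa]0;)"
SPACES = SPACE + "+"                     # the {spaces} placeholder
SPACE_DASH = r"(?:-|" + SPACE + r")"     # the {spaceDash} placeholder

ISBN_RE = re.compile(
    r"\b(ISBN)(" + SPACES + r")"
    r"((?:97[89]" + SPACE_DASH + r"?)?(?:[0-9]" + SPACE_DASH + r"?){9}[0-9Xx])\b"
)
PMID_RE = re.compile(r"\b(PMID)(" + SPACES + r")([0-9]+)\b")


def replace_magic_links(text):
    """Replace ISBN and PMID magic links with templates (hypothetical helper).

    This sketch does not skip any excluded regions (comments, URLs, etc.).
    """
    text = ISBN_RE.sub(r"{{ISBN|\3}}", text)
    text = PMID_RE.sub(r"{{PMID|\3}}", text)
    return text


print(replace_magic_links("See ISBN 978-0-306-40615-7 and PMID 123456."))
```

Group 3 holds the identifier itself, which is why both replacements use `\3`. Because the space pattern excludes newlines, an `ISBN` at the end of one line followed by digits on the next is left alone.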
The following are excluded during replacement:
- HTML comments, section headers, wikilinks, interwiki links, #property, #invoke, categories, files
- anything inside gallery, math, nowiki, pre, source, score, or syntaxhighlight tags
- all HTML tags and attributes
- all URLs (with or without brackets), including linked text
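One plausible way to honor exclusions like those listed above is to locate the excluded spans first and run the replacement only on the text between them. The sketch below is an assumption about the approach, not the bot's implementation. It covers only a subset of the list (HTML comments, a few tag pairs, wikilinks, and URLs), and `EXCLUDED_RE`, `replace_outside_excluded`, and `make_pmid_templates` are hypothetical names.

```python
import re

# Spans that must be left untouched (a simplified subset of the exclusion list).
EXCLUDED_RE = re.compile(
    r"<!--.*?-->"                                        # HTML comments
    r"|<(?P<tag>gallery|math|nowiki|pre|source|score|syntaxhighlight)\b[^>]*>"
    r".*?</(?P=tag)\s*>"                                 # content of listed tags
    r"|\[\[[^\[\]]*\]\]"                                 # wikilinks
    r"|\[https?://[^\]\n]*\]"                            # bracketed URLs with link text
    r"|https?://[^\s<>\]]+",                             # bare URLs
    re.DOTALL | re.IGNORECASE,
)


def replace_outside_excluded(text, replace):
    """Apply replace() only to the parts of text outside excluded spans."""
    out = []
    pos = 0
    for match in EXCLUDED_RE.finditer(text):
        out.append(replace(text[pos:match.start()]))  # editable text
        out.append(match.group(0))                    # excluded span, verbatim
        pos = match.end()
    out.append(replace(text[pos:]))
    return "".join(out)


def make_pmid_templates(chunk):
    # Simplified stand-in for the real find/replace rules.
    return re.sub(r"\bPMID ([0-9]+)\b", r"{{PMID|\1}}", chunk)


sample = "PMID 1 <!-- PMID 2 --> [[PMID 3]] <nowiki>PMID 4</nowiki> [http://example.org/PMID 5] PMID 6"
print(replace_outside_excluded(sample, make_pmid_templates))
```

In this sketch, only `PMID 1` and `PMID 6` are converted. The occurrences inside the comment, the wikilink, the nowiki tag, and the bracketed URL are copied through unchanged.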
Unless community consensus is established to add the templates on an ongoing basis, this task will end when magic links functionality is disabled.
Discussion
See Wikipedia:Village_pump_(proposals)#Magic_link_RFC_follow_up as well as
- Wikipedia:Bots/Requests_for_approval/PrimeBOT_13
- Wikipedia:Bots/Requests for approval/CitationCleanerBot 2
- Wikipedia:Bots/Requests_for_approval/Yobot_54
Headbomb {t · c · p · b} 18:09, 29 March 2017 (UTC)
- With these being complementary / redundant to the bots listed above - have you coordinated efforts to ensure these produce similar results? — xaosflux Talk 23:34, 29 March 2017 (UTC)
- Anything those bots do outside of converting ISBN and PMID magic links to templates is outside the scope of this task. The regexps for this task are equivalent to those provided by Anomie for PrimeBOT 13. Yobot 54 and CitationCleanerBot 2 should be using the same regexps as PrimeBOT 13. — JJMC89 (T·C) 01:23, 30 March 2017 (UTC)
Please see my note at the end of the PrimeBOT BRFA about RFC conversions. I think they should be done with human oversight, since there are too many false positives in the category. Also, the bot should operate only on pages that are in one of the magic links categories in order to limit false positives. – Jonesey95 (talk) 00:31, 30 March 2017 (UTC)
- RFC removed. — JJMC89 (T·C) 01:23, 30 March 2017 (UTC)
- RFC should be fine if you keep it to RFC > 100 Headbomb {t · c · p · b} 02:09, 30 March 2017 (UTC)
- Not really. – Jonesey95 (talk) 15:03, 30 March 2017 (UTC)
- I took a look through some RFC magic links. Most of them should not be replaced with {{IETF RFC}}, regardless of number. Any used as a reference should use {{cite IETF}}. Any in |title= or |work= should not be links. There are also many inline RFC magic links, which goes against WP:EL. — JJMC89 (T·C) 15:31, 30 March 2017 (UTC)
How will the regex above (thanks for providing it, BTW) deal with the edge cases found within URL links at Wikipedia:WikiProject Check Wikipedia/ISBN errors, which overlaps substantially with Wikipedia:CHECKWIKI/WPC 069 dump? Will those links need to be cleaned up first? There are only about 600 of them, down from a few thousand before we fixed the easy ones. – Jonesey95 (talk) 15:06, 30 March 2017 (UTC)
- The provided regexps are only for the replacement itself. URLs are excluded from replacement. (See function details.) — JJMC89 (T·C) 15:31, 30 March 2017 (UTC)
- Very good. I did not understand "with or without brackets". Thanks. – Jonesey95 (talk) 17:09, 30 March 2017 (UTC)
- Just out of curiosity, is everyone with a bot going to submit a BRFA to do this task? Primefac (talk) 16:12, 2 April 2017 (UTC)
- I wasn't going to; however, CBM requested it. — JJMC89 (T·C) 17:26, 3 April 2017 (UTC)
- Gotcha. 500k edits is rather huge. Primefac (talk) 17:30, 3 April 2017 (UTC)
We could have a race!!! --MZMcBride (talk) 02:55, 4 April 2017 (UTC)
How do we move forward here? Are we waiting on approval for a trial? Legoktm (talk) 19:55, 9 April 2017 (UTC)
- If everyone's using the same regex, then technically speaking trials have already been run. With three (four?) requests for the same task, it might just be a question of literally splitting up the list to avoid unnecessary server loads. Primefac (talk) 20:19, 9 April 2017 (UTC)
- What list are you planning to use? If it's the tracking categories, I don't think there should be much trouble with multiple bots going through it if they start at different points within the category.
- {{BAGAssistanceNeeded}} to determine whether a trial for this bot is needed or if it can proceed with approval, since there haven't been any further comments in the past few days. Legoktm (talk) 04:36, 10 April 2017 (UTC)
- @JJMC89: Would you mind adding code to also do other common identifiers, as a side benefit? PMID and DOI specifically? Headbomb {t · c · p · b} 19:27, 17 April 2017 (UTC)
- Since those are not magic links, it is outside the scope of this task. "Anything [...] outside of converting ISBN and PMID magic links to templates is outside the scope of this task." — JJMC89 (T·C) 19:49, 17 April 2017 (UTC)
- Yes, but I'm asking if you'd enlarge it to perform other fixes on top of just magic link conversions, since other bots will likely be doing those as well. It would save a lot of edits. Headbomb {t · c · p · b} 19:51, 17 April 2017 (UTC)
- Community consensus has not been established for automatically templating PMC and DOI. If consensus is established, I can file another BRFA to extend this task. — JJMC89 (T·C) 01:59, 18 April 2017 (UTC)
- Notices were posted on the VP, no one objected, those who commented thought it was a good idea. The reason I ask for this is because there's no world where an unlinked doi:10.1016/j.coi.2004.08.001 is preferred over a linked doi:10.1016/j.coi.2004.08.001, and seeing a citation with an unlinked doi:10.1016/j.coi.2004.08.001 + linked PMID 123456 is just plain confusing. Headbomb {t · c · p · b} 02:05, 18 April 2017 (UTC)
- That's entirely unrelated and shouldn't be part of this BRFA. If an editor leaves an unlinked doi then it will appear unlinked, while the PMID will get magic linked by MediaWiki. The point of this is to stop relying on that MediaWiki functionality as it will go away eventually.
- Also, I don't understand what you mean by "It would save a lot of edits." There's no limit on how many edits Wikipedia has left. Legoktm (talk) 02:19, 18 April 2017 (UTC)
- Legoktm, the argument re: saving edits is to avoid redundancy. Let's say 100k articles have non-template PMID, ISBN, and dois. If PrimeBOT does the PMID, Magic links bot does the ISBN, and Yobot does the dois, that's 300k edits. On the other hand, if one bot does all three, that's only 100k edits. I don't think it's necessarily about the edit count, though, but about clogging up watchlists (which people seem to complain about a fair amount when it comes to bots). In my hypothetical scenario, one page would get three minor edits (which annoys people for some reason). Both JJMC89 and Headbomb have made valid arguments why it shouldn't/should (respectively) be added to the BRFA, so I'm more or less apathetic regarding the specific outcome (as you'll probably notice on PrimeBOT's own magic links BRFA), but I think it might be what's holding up the process (that, and some odd regex). Primefac (talk) 02:26, 18 April 2017 (UTC)
- The argument mostly is what Primefac said. DOI/PMC links are the most common after ISBN/PMID (doi may be even more common). These bots will perform ~500 k edits across the wiki. Those edits should be as densely packed with fixes as possible, so we don't have 4-5 bots doing essentially the same thing: one for ISBN/PMID, then one for DOI, then one for PMC, then one for... I want to develop the logic for all identifiers in Wikipedia:Bots/Requests for approval/CitationCleanerBot 2, but it will take time to hammer out all the corner cases for all identifiers, some of which are seldom used. If the magic link bots can tackle the DOI/PMCs as a secondary task (meaning they focus on the magic link categories, but also perform other tasks if their primary one would warrant an edit), that will take care of 90-95% of the overlap, with the regex for the other identifiers being incorporated over time as it gets developed and refined. Headbomb {t · c · p · b} 02:35, 18 April 2017 (UTC)
- "Let's say 100k articles have non-template PMID, ISBN, and dois" - that's super hypothetical. I highly doubt there are 100k pages with that overlap. Have you checked to see what the real number is?
- Mostly I'm utterly confused as to where DOI came from. It's not a MediaWiki magic link and seems entirely unrelated. Legoktm (talk) 03:26, 18 April 2017 (UTC)
- It's the same general idea that identifiers should produce links. There's no reason to treat any identifier any differently than any other. Why link PMID not DOI? That's the entire reason why magic links are being deprecated: so that they are all treated on an equal footing.
- Running some scans on the last database dump, there are roughly 15000 articles with untemplated DOIs. Bare PMC usage is much lower than I thought it would be, however, around 500, although my regex certainly doesn't catch all instances right now. Headbomb {t · c · p · b} 11:25, 18 April 2017 (UTC)
- Note I've started a conversation linking all three magic-link bots together here. Primefac (talk) 03:05, 18 April 2017 (UTC)
- {{BAG assistance needed}} The above discussion has died out, so let's move on. — JJMC89 (T·C) 22:32, 7 May 2017 (UTC)
I've read this BRFA, and the many... many... many other discussions tied to it. It does appear that there is consensus for the task. It appears that the task is technically sound. I understand that similar bots have already run trials for the same task. As those bots run on a different framework, I would want to make 100% sure that this one functions as described. Approved for trial (100 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. SQLQuery me! 18:20, 18 May 2017 (UTC)
- Trial complete. 100 edits. — JJMC89 (T·C) 02:16, 19 May 2017 (UTC)
- I checked all 100 of these edits and found no errors. In each case, all of the ISBNs and PMIDs that were placing the page in the ISBN or PMID magic links categories had been converted to templates, thereby removing the pages from the tracking categories. That means that the bot is achieving its goal.
- The bot's work also unearthed some errors, for example in 2nd Kent Artillery Volunteers, where the page was placed in Category:Pages with ISBN errors due to an invalid ISBN that was otherwise silently failing. That is a nice improvement as well. – Jonesey95 (talk) 04:10, 19 May 2017 (UTC)
- I don't see any errors either. I would like to give others an opportunity to review these edits as well. I'm leaning towards approving this request after 48 hours if there are no concerns. SQLQuery me! 04:30, 19 May 2017 (UTC)
- Looks good to me as well. The only thing I would suggest is if we could add a link to this BRFA or the MediaWiki RfC (or somewhere else?) in the edit summary in case people are confused. Legoktm (talk) 09:44, 19 May 2017 (UTC)
- I too would like a better edit summary, with a link to the BRFA or RFC and an explicit link to where to report issues. A slightly improved bot user page on what to do when an ISBN has an error in it would be nice too. It could point to existing template documentation. For CS1/2 templates, there is Help:CS1 errors#Check .7Cisbn= value, but there is no equivalent thing in {{ISBN}}. Those should likely be developed (along with support for a |ignore-isbn-error=true). Headbomb {t · c · p · b} 11:23, 19 May 2017 (UTC)
- Agree with most of those above, as this task is huge, improve the edit summary and the Bot userpage - as there are several of these, perhaps link to a Wikipedia: page explaining what is going on with magic links. — xaosflux Talk 03:57, 20 May 2017 (UTC)
- I've set the edit summary to "Replace [[:mw:Help:Magic links|magic links]] with templates per [[Special:Permalink/772743896#Future of magic links|local RfC]] and [[:mw:Requests for comment/Future of magic links|MediaWiki RfC]]". If Help:Magic links were up to date, I would link there. There is no need for an explicit "report issues" link – the implicit place for such things is the bot's talk page. I'll create an enwiki userpage for the bot upon approval. — JJMC89 (T·C) 04:58, 20 May 2017 (UTC)
- I have updated Help:Magic links. Other editors may have ideas about what else the page should say. – Jonesey95 (talk) 06:11, 20 May 2017 (UTC)
Approved. - please create a userpage before launching this task. SQLQuery me! 03:34, 22 May 2017 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.